Construction of a ceRNA network to reveal a vascular invasion associated prognostic model in hepatocellular carcinoma

Abstract The aim of this study is to explore the prognostic value of vascular invasion (VI) in hepatocellular carcinoma (HCC) by searching for competing endogenous RNAs (ceRNA) network and constructing a new prognostic model for HCC. The differentially expressed genes (DEGs) between HCC and normal tissues were identified from GEO and TCGA. StarBase and miRanda prediction tools were applied to construct a circRNA-miRNA-mRNA network. The DEGs between HCC with and without VI were also identified. Then, the hub genes were screened to build a prognostic risk score model through the method of least absolute shrinkage and selection operator. The prognostic ability of the model was assessed using the Kaplan−Meier method and Cox regression analysis. In result, there were 221 up-regulated and 47 down-regulated differentially expressed circRNAs (DEcircRNAs) in HCC compared with normal tissue. A circRNA-related ceRNA network was established, containing 11 DEcircRNAs, 12 DEmiRNAs, and 161 DEmRNAs. Meanwhile, another DEG analysis revealed 625 up-regulated and 123 down-regulated DEGs between HCC with and without VI, and then a protein–protein interaction (PPI) network was built based on 122 VI-related DEGs. From the intersection of DEGs within the PPI and ceRNA networks, we obtained seven hub genes to build a novel prognostic risk score model. HCC patients with high-risk scores had shorter survival time and presented more advanced T/N/M stages as well as VI occurrence. In conclusion a novel prognostic model based on seven VI-associated DEGs within a circRNA-related ceRNA network was constructed in this study, with great ability to predict the outcome of HCC patients.


Introduction
Hepatocellular carcinoma (HCC) is one of the most common malignant tumors and the fourth leading cause of cancerrelated deaths worldwide.The highest incidence rates of HCC in the world are reported in Asia and Africa.Although Mongolia has the highest incidence (93.7 per 100,000), China has the largest number of HCC patients, due to both a relatively high incidence (18.3 per 100,000) and the world's largest population (1.4 billion persons).Though the great advances in diagnosis and therapy, the prognosis of HCC patients remains poor, with mortalities approximating incidence rates worldwide [1][2][3].The average 5 year survival rate of HCC patients in the US was 19.6%, but it was only 2.5% for advanced HCC patients, and the poor prognosis was closely related to tumor metastasis [4].Vessel is an important pathway for cancer cells to invade other organs, and angiogenesis is essential for tumor growth and metastasis.After the cancer cells penetrate the microvessels, they mainly contact the basement membrane and bind to matrix proteins through special membrane receptors, such as certain integrin receptors on the surface of cancer cells binding to laminin in the matrix, thus entering the metastatic tissues.Vascular invasion (VI) includes macrovascular invasion or microvascular invasion, and represents the aggressive nature of the spread of the tumor cells.Invasion of the hepatic venous tributaries can cause systemic metastasis, while invasion of portal venous may result in intrahepatic spread of the tumor cells [5].Evidence showed that VI was an independent prognostic indicator for HCC [6].The presence of VI was an unfavorable prognostic factor of HCC recurrence and overall survival (OS) [7].Currently, the specific mechanism of VI and metastasis of HCC is still not well understood.Thus, to better understand the molecular mechanism underlying VI of HCC is vital for assessing the risk of HCC metastasis and patients' survival [8].In addition, by studying the mechanism of VI, molecularly targeted therapies can be used to achieve anticancer effects, such as accurate and targeted attack on tumor cells, inhibition of angiogenesis and metastasis of cancer cells, and reversal of multidrug resistance.
circRNAs are a class of non-coding RNAs that are widely expressed in mammalian cells.circRNAs were reported to regulate many cellular processes including tumor initiation and progression [9].Recently, a large number of circRNAs have been demonstrated to play important roles in HCC development via involvement in the competing endogenous RNA (ceRNA) network.For example, circ_0067835 was found to be elevated in HCC and could promote cell proliferation and metastasis via miR-1236-3p/Twist2 axis [10].circRASSF5 was reported as a tumor suppressor in HCC, which could competitively sponge miR-331-3p and thus enhance the tumor inhibitory effect of PH domain and leucine rich repeat protein phosphatase [11].circ0003998 was shown to act as a ceRNA of miR-143-3p to relieve the repressive effect on FOSL2, an EMT-related stimulator, thus promoting HCC metastasis [12].
Currently, TCGA and GEO databases were widely used to screen differentially expressed genes (DEGs) to build prognostic signatures for assessing the survival of HCC patients.One of the important screening strategies is to select biological hub genes within the ceRNA network.For instance, Chen et al. [13] constructed a lncRNA-related ceRNA regulatory network for HCC using DEGs from TCGA database, and then 11 lncRNAs within the ceRNA network were selected to build a prognostic signature.The multivariate Cox regression analysis showed that the prognostic signature could be an independent indicator for HCC patients' survival.Zhang et al. [14] reported a differential lncRNA-miRNA-mRNA regulatory network in HCC based on TCGA-LIHC data and three key lncRNAs were eventually screened to construct a prognostic signature for OS.Similarly, Huang et al. [15] constructed a ceRNA network comprising 44 DEmRNAs, 7 DElncRNAs, and 20 DEmiRNAs in hepatitis B virus (HBV)-related HCC, and then established a 7-lncRNA signature, which was finally verified as a potential prognostic predictor for HBV-related HCC patients.Notably, as VI has been identified to be a prognostic index for HCC patients' survival, a VI-related ceRNA network and a 8-lncRNA prognostic model were subsequently constructed by Tao et al. and time-dependent ROC analysis verified the utility of the model to predict the clinical outcomes of HCC patients [16].Currently, several circRNA-related ceRNA networks were constructed for HCC.However, there was no prognostic model for HCC built based on the hub genes in relation to VI and circRNA-related ceRNA network simultaneously so far.
Here we sought to establish a novel prognostic signature for HCC by selecting VI-related hub genes within the circRNA-related ceRNA network.In this study, we first explored the DEGs between HCC and normal tissue through GEO and TCGA databases.Then, we constructed a circRNA-miRNA-mRNA network and investigated the biological functions of these HCC-related DEGs.Additionally, the DEGs between HCC with and without VI were also screened to build a protein-protein interaction (PPI) network.After making the intersection of DEGs within PPI and ceRNA networks, we obtained seven hub genes to construct a novel prognostic risk score model.The prognostic value of the model for HCC patients' survival was further investigated.Finally, the immune cell infiltration in HCC was assessed according to the risk scores from the prognostic model.

Data acquisition and processing
We downloaded the circRNA expression dataset GSE94508 [17] and GSE97332 [18] containing human HCC and adjacent normal tissues from GEO (https://www.ncbi.nlm.nih.gov/geo/)database.Two sets of data are from platform GPL19978 and our search terms were HCC and circRNA.There are 10 samples in GSE94508, which are divided into 5 tumor tissues and 5 tumor adjacent normal tissues, and 14 samples in GSE97332, consisting of 7 tumor tissues and 7 tumor adjacent normal tissues.The primary data were processed by background correction and quantile normalization, and the integrated circRNA expression data were obtained after batch effect removal using R package sva [19].The ComBat function in R packet sva [19] was used to integrate multiple data and remove the batch effect.We used TCGAbiolinks package [20] to download HCC (Liver hepatocellular carcinoma, TCGA-LIHC)-related gene expression data from the TCGA database (https://portal.gdc.cancer.gov/),which contained 371 HCC tissues and 50 tumor adjacent normal tissues.Meanwhile, TCGAbiolinks package [20] was used to obtain the corresponding clinicopathological survival information of 371 tumor samples, including age, survival status, follow-up time, VI, stage, and so on.In addition, the gene expression data of normal liver tissue (GTEX-Liver) were downloaded from GTEX database, which contained 110 normal samples.Two datasets of TCGA-LIHC and GTEX-Liver were combined to extract the expression data of common genes, and the batch effect between different datasets was corrected by ComBat function in R package sva [19].The combined data included 371 HCC tumor samples and 160 control samples.Gene expression data included miRNA and mRNA expression levels of each patient, and clinical information included age, sex, pathological stage, VI, survival status, and total survival time.Only patients with complete survival information and gene expression data were included in this study.Finally, a total of 358 patients were selected for further analysis.

Screening of DEGs
In order to assess the difference of gene expression between HCC and normal tissue, R packet limma [21] was used to analyze the difference among different groups.|logFC| > 1 and P value < 0.05 were set as thresholds for differentially expressed circRNAs (DEcircRNAs), where circRNAs with logFC > 1 were considered significantly up-regulated, and circRNAs with logFC < −1 were considered down-regulated.For differentially expressed mRNAs (DEmRNAs), we set the threshold value as |logFC| > 2 and P value < 0.05, where DEmRNAs with logFC < −2 were down-regulated and DEmRNAs with logFC > 2 were up-regulated.Similarly, |logFC| > 1 and P value < 0.05 were used to identify differentially expressed miRNAs (DEmiRNAs).The volcano map was used to show the up and down-regulated DEGs, and pheatmap R-package [22] was used to draw the heat map of these DEGs in all samples.

Construction of a ceRNA network
To better understand the effect of circRNAs on the progression of HCC, a ceRNA network was built based on DEcircRNAs, DEmiRNAs, and DEmRNAs.The human sequences of DEcircRNAs and DEmiRNAs were downloaded from circBase (http://www.circbase.org/)and miRBase (version 21; http://www.mirbase.org/)databases, respectively.The miRanda prediction tool was used to predict the interaction between DEcircRNAs and DEmiRNAs.In addition, the potential mRNAs targeted by DEmiRNAs were obtained from StarBase (http://starbase.sysu.edu.cn/)database, which provided prediction results from seven prediction programs (TargetScan, microT, miRmap, picTar, RNA22, PITA, and miRanda).If the interaction between miRNA and mRNA was predicted in not less than four programs, it was selected for next analysis.Then, we overlapped the target mRNAs with the DEmRNAs mentioned above.The nodes that could not form circRNA-miRNA-mRNA interaction relationship were removed, and finally a ceRNA network was established and visualized by the Cytoscape software (version 3.7.0/www.cytoscape.org).

Functional enrichment analysis
Gene Ontology (GO) functional annotation analysis is a common method for large-scale functional enrichment of genes [23], including biological processes (BPs), molecular functions (MFs), and cellular components (CC).Kyoto Encyclopedia of Genes and Genomes (KEGG) is a widely used database for storing information about genomes, biological pathways, and drugs [24].In order to study the BPs that the ceRNA network might participate in, we chose the DEmRNAs within the ceRNA network for GO functional annotation analysis and KEGG pathway enrichment analysis by using clusterProfiler R software package [25].P value < 0.05 was considered statistically significant.

Gene expression difference between HCC with and without VI
According to the clinical information of patients with HCC, we divided patients into two groups according to HCC with or without VI; 236 patients with HCC marked "None" without VI, and 122 patients with HCC labeled "Micro" and "Macro" with VI.In order to explore the difference between HCC with and without VI, limma R-packet was used to analyze the DEGs between the two groups.|logFC| > 1 and Padj < 0.05 were set as the threshold of DEGs.The genes of logFC > 1 and Padj < 0.05 were regarded as the up-regulated DEGs, and the genes of logFC < −1 and Padj < 0.05 were down-regulated.The volcano map was used to show the up-regulated and down-regulated DEGs, and the pheatmap R-package drew the heat map of these DEGs in all samples.

Construction of PPI network
PPI network is composed of individual proteins through the interaction between each other, which participates in various life processes such as biological signal transmission, gene expression regulation, and cell cycle regulation.PPI analysis is very important for elucidating the molecular mechanism of key cellular activities in carcinogenesis.
Based on the DEGs between HCC with and without VI, the STRING database (https://string-db.org/)was used to evaluate the PPI information and further construct the PPI network [20].Taking the interaction score of 0.4 as the cut-off value, the PPI network was visualized by cytoscape software.

Construction of prognostic model and analysis of prognosis
Based on the intersection of DEGs within the PPI and ceRNA network mentioned above, we constructed a prognostic risk score model through the method of least absolute shrinkage and selection operator (LASSO) to realize the risk stratification of HCC patients.LASSO regression is a commonly used variable selection method when fitting high-dimensional generalized linear models.The glmnet package [26] in R was used to execute the LASSO algorithm.The risk scoring model was established by combining the regression coefficient with the corresponding gene expression value.After calculating the risk score, taking the median risk score as the cut-off point, the patients in each cohort were divided into low-and high-risk group accordingly.Univariate and multivariate Cox analyses were used to analyze the ability of risk score in combination with clinicopathological features to predict the OS.Then, the risk score model and clinicopathological parameters were used for further analysis through the clinical predictive Nomogram, which was constructed by rms R-packet [27].Decision curve analysis (DCA) was performed, and clinical impact curves were drawn to evaluate whether the model-based decisions were beneficial to patients.

Difference of immune cell infiltration between high-and low-risk groups
The immune microenvironment is mainly composed of immune cells, inflammatory cells, fibroblasts, endothelial cells, bone marrow-derived cells, extracellular matrix, cytokines, and chemokines, which is a comprehensive system.The analysis of immune cells infiltration in the clinical samples is of great importance to discern the mechanisms underlying cancer progression and predict prognosis.Single sample gene cluster enrichment analysis (ssGSEA) is an extension of GSEA method.ssGSEA algorithm was used to calculate the content of 28 kinds of immune cells in high-and low-risk groups [28].The composition of immune cells in high and low risk groups was visualized by box map.

Statistical analysis
All statistical analyses were carried out in R language (https://www.r-project.orgversion 4.0.2).For the comparison of continuous variables in two groups, the statistical significance of normal distribution variables was estimated by independent t-test, the differences between non-normal distribution variables were analyzed by Wilcoxon rank sum test, and the differences between multiple groups of independent variables were analyzed by Kruskal−Wallis Test.The Kaplan −Meier (KM) method was used to evaluate the difference in survival time of patients with HCC, and the logarithmic rank test was used to determine the statistical significance of the observed differences between distinct groups.The hazard ratio and 95% confidence interval were calculated based on Cox regression analysis.All the statistical P values were bilateral, P < 0.05 was considered statistically significant.
Ethics approval and consent to participate: TCGA and GEO both are public databases.The patients involved in the database have given their approval in the original studies.Users can download relevant data for research and publish relevant articles.Our study is based on these opensource data, so there are no ethical issues.

Identification of DEGs
The schematic diagram of our analysis strategy is shown in Figure A1.To analyze the difference of gene expression between HCC and normal tissue, we first analyzed the mRNA expression data.We used limma differential analysis to get 6,674 DEmRNAs, including 4,446 up-regulated DEmRNAs and 2,228 down-regulated DEmRNAs.Using DEmRNAs and data grouping information to draw a classification heat map, DEmRNAs could well distinguish HCC from normal tissue (Figure 1a and b).For miRNA expression data, we used limma differential analysis to get 100 DEmiRNAs, including 33 up-regulated DEmiRNAs and 67 down-regulated DEmiRNAs.Using DEmiRNAs and data grouping information to draw a classification heat map, DEmiRNAs could well differentiate HCC from normal tissue (Figure 1c and d).
For the integrated circRNA expression data, we used limma difference analysis to get 211 up-regulated DEcircRNAs and 47 down-regulated DEcircRNAs.Using DEcircRNAs and data grouping information to draw a classification heat map, DEc-ircRNAs could also well discriminate HCC from normal tissue (Figure 1e and f).

Construction of the ceRNA network based on HCC-related DEGs
We used the StarBase database to identify the DEmRNAs targeted by DEmiRNAs.A total of 1,059 miRNA-mRNA interactions were predicted, including 51 DEmiRNAs and 316 DEmRNAs (Figure 2a).Using the miRanda prediction tool, 36 circRNA-miRNA interactions were predicted, based on 19 DEcircRNAs and 25 DEmiRNAs (Figure 2b).After integrating the circRNA-miRNA interaction with the miRNA-mRNA interaction and removing the nodes that cannot form circRNA-miRNA-mRNA interaction, a novel HCC-related ceRNA network was established.The network consisted of 11 DEcircRNAs, 12 DEmiRNAs, and 161 DEmRNAs (Figure 2c).A novel VI-associated prognostic model in HCC  5

Functional enrichment analysis of DEmRNAs
In order to explore the biological functions of these HCCrelated DEmRNAs, we performed the functional enrichment analysis (Figure 3a and Tables 1 and 2) HCC-related DEmRNAs were mainly enriched in BPs associated with axonogenesis, skeletal system development, mesenchyme development, epithelial cell proliferation, and protein kinase B signaling (Figure 3b).At the same time, it was enriched in CCs such as filopodium, transcription regulator complex, neuron to neuron synapse, neuronal cell body, and postsynaptic density (Figure 3c) and MFs such as DNA-binding transcription activator activity, RNA polymerase II-specificity, DNA-binding transcription activator activity, beta-catenin binding, chemorepellent activity, and growth factor binding (Figure 3d).Then, the pathway enrichment analysis was carried out, and the results showed that HCC-related DEmRNAs were enriched in biological pathways such as breast cancer, melanoma, transcriptional mis-regulation in cancer, p53 signaling pathway, cell cycle, glioma, and so on (Figure 3e). Figure 3f shows the most significant enrichment pathway: hsa05224: breast cancer (Figure 3F).Here we found that it was intriguing to observe the enrichment of a pathway for a different cancer type.Therefore, we compared the DEGs between breast cancer and HCC from TCGA database, and A novel VI-associated prognostic model in HCC  7 found that there were commonalities between these two cancer types (Figure A2).

Screening hub genes in relation to VI
A total of 748 DEGs were obtained by differential expression analysis between HCC with VI and those without VI, including 625 up-regulated and 123 down-regulated DEGs (Figure 4a).We screened the proteins encoded by these DEGs to construct a PPI network visualized in Cytoscape (Figure 4b).There are 122 DEGs and 278 PPI pairs in the PPI network.The top five genes that interact with other DEGs are FOS and NRXN1 (interacting with 18 DEGs), CCNA2, ESR1, and NTRK2 (interacting with 17 DEGs).We overlapped the DEGs within the PPI and ceRNA network mentioned above to get seven genes as hub genes (Figure 4c).We used the R packet "GOSemSim" to calculate the GO semantic similarity of these seven hub genes.The results showed that there was a high correlation among HOXC10, HOXC11, HOXC8, and HOXD10 (Figure 4d).Furthermore, the gene expression levels of seven hub genes were screened from two datasets of TCGA-LIHC and GTEX-Liver, and visualized by R-packet ggplot2 box map.The results showed that there were significant differences in the expression of seven hub genes between HCC tumor samples and control samples (Figure A3).

Construction of prognostic risk score model based on hub genes
We used R packet "glmnet" to obtain the regression coefficients of these seven hub genes based on LASSO method.Combined with the gene expression levels of these hub genes, we constructed a risk score model to predict the prognosis of patients with HCC.The regression coefficients of the seven hub genes were as follows: HOXC8: 0.037, GNAO1: 0.017, ADRA1D: 0.020, ARPP21: −0.008, HOXC11: 0.020, HOXC10: −0.014, and HOXD10: 0.047 (Figure 5a and b).We explored the correlation among the expression of these hub genes, and the results showed high levels of correlation among HOXC10, HOXC11, and HOXC8 (Figure 5c).We calculated the risk score for each patient in the TCGA-HCC cohort, and then divided the patients into high and low risk groups according to the median risk score.KM survival analysis showed that OS in the higher risk group was significantly shorter than that in the lower risk group (P < 0.0001, Figure 5d).Next we assessed the relationship between risk score and HCC patient survival.The results showed that the higher the risk score, the shorter the survival time of HCC patient and the higher the proportion of death (Figure 5e).In order to verify the robustness of the risk score, we tested the utility of our model in the validation set data.The verification set data were from the Chinese HCC patients with HBV infection (CHCC-HBV) cohort from 2010 to 2014 at Zhongshan Hospital [29], which was accessed through NODE (https://www.biosino.org/node).Combined with the survival data of patients and our model, the risk score of each patient was calculated, and the patients were grouped and analyzed by risk score.The results showed that the patients with high-risk score in the verification set also showed a trend of worse prognosis (Figure A4).A novel VI-associated prognostic model in HCC  9

Correlation between risk score and clinical parameters
We analyzed the correlation between risk score and clinical parameters, and the results showed that patients in the high-risk group were usually associated with higher T/N/M stages and VI occurrence (Figure 6a).Then, the effects of the expression levels of seven key genes on the prognosis of patients was analyzed, and the results indicated that the high expression of HOXC8, ADRA1D, HOXC11, HOXC10, and showed that risk stratification, T/N/M stage, and ethnicity were all prognostic factors for HCC (Figure 7e).Multivariate Cox analysis showed that risk stratification was the only prognostic factor for HCC (Figure 7f).The DCA curve showed that the model could be beneficial to the prognosis evaluation of patients (Figure 7g).The time-ROC analysis also showed that the 1 year survival prediction performance of the prognostic model was 58%, and both the 3 and 5 year survival prediction performance were 61% (Figure 7h).A novel VI-associated prognostic model in HCC  13

Differences in immune infiltration
After analyzing the immune cell infiltration in HCC, we found that the counts of activated CD4+ T cell, T follicular helper cell, mast cell, type 2T helper cell, activated CD8+ T cell, regulatory T cell, and natural killer T cell in high-risk group were significantly more than that of low-risk group (Figure 8a).Moreover, the correlations between the immune cells content in the high-risk group and the low-risk group were further calculated, and the results showed that the correlation between the immune cells content in the highrisk group was significantly higher than that in the low-risk group (Figure 8b and c).The relationship between hub gene expression and the immune cells of two groups were analyzed and the result showed that there was a noteworthy difference between HOXC11, HOXC8 expression, and immune cells (Figure 8d−e).

Discussion
Nowadays, HCC is still the leading cause of cancer mortalities globally [30].Over the past decades, large numbers of studies had been conducted to explore the molecular mechanisms underlying HCC advancement and many target drugs have been applied to treatment, but patients with metastatic HCC still had poor prognosis [31].Among all the factors, VI stands out due to significant contribution to HCC metastasis, and VI could include angiogenesis and dysfunction of vascular endothelial cell barrier, which provide pathway for cancer cells to invade other organs [32].Previous study revealed that the presence of microvascular invasion means the presence of tumor invasion to the adjacent tissues of HCC.Microvascular invasion was one of the vital elements of early HCC recurrence and distal metastasis [33].Therefore, it is of great importance to investigate the molecular mechanism underlying VI of HCC and explore effective treatment target to improve the prognosis of HCC patients.
Increasing evidence has uncovered the significant involvement of noncoding RNAs (ncRNAs) in the tumorigenesis of HCC.circRNAs are covalently closed loop structure ncRNA with evolutionary conservation and high abundance in eukaryotes.Functionally, circRNAs could compete for miRNA binding and remove the inhibitory effect of miRNA on their target mRNAs, thus forming ceRNA mechanism.In addition to functioning as miRNAs sponges, circRNAs could also interact with RNAbinding proteins, and eventually exert regulatory roles in different BPs, including cell proliferation, cell cycle, apoptosis, migration, EMT invasion, glycolysis, angiogenesis, VI, and metastasis [34].A growing body of studies has demonstrated that circRNAs play important roles in the occurrence and development of HCC and other liver diseases.Most importantly, circRNAs have great clinical value as potential biomarkers and therapeutic target for HCC [35,36].Therefore, more and more attention has been paid to circRNAs in HCC.Very recently, several circRNA-related ceRNA networks were identified in HCC and subsequently the prognostic risk score models were built based on key genes within these ceRNA network.Han et al. reported a ceRNA network composed of DEcircRNAs and then a prognostic risk assessment model was developed based on seven hub genes (PLOD2, TARS, RNF19B, CCT2, RAN, C5orf30, and MCM10), which was verified to be an independent factor for predicting prognosis of HCC [37].Notably, two lncRNArelated ceRNA networks associated with VI have been reported in HCC by Cai et al. [38] and Tao et al. [16], respectively, and the prognostic signatures were also built based on distinct DEGs.However, to our knowledge, the prognostic risk score models for HCC based on hub genes in relation to both VI and circRNA-related ceRNAs have never been reported.In the current study, the differences of gene expression between HCC and adjacent normal tissue were explored first, and then we identified a novel circRNA-miRNA-mRNA network consisting of 12 DEmiRNAs, 11 DEcircRNAs, and 161 DEmRNAs.Previous study demonstrated that a circRNA-miRNA-mRNA network involved in the pathogenesis and therapy strategy of HCC [39], and the DEmiRNAs, DEcircRNAs, and DEmRNAs were all different from our study, it might be explained that the distinct database and biological function were investigated.
In our study, the DEGs between HCC with and without VI were also analyzed and used to build a PPI network.The top five genes that interacted with other DEGs were FOS, NRXN1, CCNA2, ESR1, and NTRK2.The great importance of FOS in HCC has also been reported by others.Study has verified that FOS might be a potential marker for predicting HCC prognostic.The expression level of FOS in HCC patients was inversely related to the OS [40].Furthermore, we made the intersection of the DEGs within the PPI network and the DEGs in the ceRNA network to get seven hub genes.Subsequently, a seven-gene based prognostic risk score model was constructed for HCC.The result of TCGA-HCC cohort showed that the OS of the higher risk group was markedly shorter than that of the lower risk group.These seven hub genes were GNAO1, ADRA1D, ARPP21, HOXC8, HOXC11, HOXC10, and HOXD10.Some of these hub genes have been reported to be implicated in HCC progression.GNAO1 was shown to be significantly downregulated in HCC, as well as being implicated in a variety of intracellular biological events.GNAO1 may act as a tumor suppressor and was a reliable biomarker of relapse prediction for HCC [41,42].A previous bioinformatics analysis revealed that HOXC genes (HOXC8, HOXC9, HOXC10, HOXC11, HOXC12, and HOXC13) might participate in pathogenesis of gastric adenocarcinoma [43].In addition, high expression levels of HOXC genes were significantly correlated with shorter OS of the gastric cancer patients [44].Here we found that HOX genes might play the same roles in HCC, as KM analysis demonstrated that HOXC8, HOXC11, HOXC10, and HOXD10 expressions were all inversely related to HCC patients' OS.To our knowledge, this is the first time to report the expression and the prognostic role of HOXC genes in HCC.
It is generally believed that HCC is preceded by liver damage and extensive inflammation, and therefore is accompanied by immune cells infiltration.Intratumoral immune cell infiltration has been associated with HCC prognosis in previous study [45].HCC is often accompanied with a dense stroma coupled with infiltrated immune cells, referring to as the tumor microenvironment.Populations of infiltrated immune cells, such as CD163+ macrophages and CD8+ T cells, are associated with the prognosis in HCC, and immune cells in the tumor microenvironment can be a target for HCC therapy [46].Here we analyzed the immune cell infiltration in HCC patients according to the risk stratification based on the prognostic model.The counts of several important immune cells, such as Activated CD4+ T cell, Mast cell, Activated CD8+ T cell, and Natural killer T, in highrisk group were all significantly higher than those in lowrisk group, and the correlation of infiltrated immune cells in high-risk group was also significantly higher than that in low-risk group.This indicated that our prognostic model was useful in predicting immune response in HCC.Moreover, our data showed a significant correlation among the expression levels of HOXC11, HOXC8, and immune cells infiltration in HCC, indicating that the HOXC genes might play vital roles in immune response of HCC.
It should be noticed that there are some limits in our study.First, the data sources and any assumptions inherent in the analysis process may enhance the potential bias of the results of this study; therefore, larger sample sizes, more databases, and better algorithms are needed to build a more comprehensive ceRNA network in HCC.Second, HCC tissue sample detection in our research center should be performed to test the clinical utility of the prognostic model.Furthermore, the role of HOXC genes in HCC progression should be verified in vitro and in vivo experiments, which might explore the underling molecular mechanism for HCC metastasis.
Furthermore, we have also noticed the technical limitations and uncertainties in applying our model to real-world scenarios.Liver cancer is a complex disease with various unknown factors and variables influencing its progression and treatment.Therefore, our model's predictions may carry a certain degree of uncertainty, particularly in complex clinical cases.We believe it is crucial to highlight this aspect to ensure transparent communication and informed decision-making by healthcare professionals.

Conclusion
Here we established a novel prognostic model based on seven hub genes, which were screened from the intersection of DEGs within a VI-related PPI network and a circRNArelated ceRNA network.The seven-gene based prognostic model was useful for evaluating the prognosis of HCC patients.This study clarified for the first time that the abnormal expression and the prognostic roles of HOXC genes (HOXC10, HOXC11, and HOXC8) in HCC.Additionally, the expression of HOXC11 and HOXC8 were correlated with immune cell infiltration, suggesting that they might be potential immunotherapy targets for HCC.Further studies are needed to verify these results in vitro and in vivo.

Figure 2 :
Figure 2: Construction of the ceRNA network.(a) miRNA-mRNA interaction network, green node represents miRNA and blue node represents mRNA.(b) circRNA-miRNA interaction network, green node represents miRNA and blue node represents circRNA.(c) circRNA-miRNA-mRNA interaction network, green node represents miRNA, orange node represents circRNA, and blue node represents mRNA.

Figure 3 :
Figure 3: GO and KEGG enrichment analysis.(a) GO function enrichment analysis, ordinate was log10 (P value), abscissa was GO terms, node color represented z-score.(b)−(d) The top five items of BP, CC, and MF showed that the node size indicated the number of genes contained in the current GO term, the color of the line represents different GO terms, and the node color represents the FC of the gene.(e) KEGG pathway enrichment analysis showed that the node color represents the FC of the gene, and the line color represents different KEGG pathways.(f) Significantly enriched KEGG pathway, hsa05224: Breast cancer.

Figure 4 :
Figure 4: PPI network of DEGs between HCC with and without VI.(a) The DEGs between HCC with and without VI were analyzed, the abscissa was log2 fold change, the ordinate was log10 (adjust P-value), the red node was up-regulated DEGs, the blue node was down-regulated DEGs, and the grey node was non-significant DEGs.(b) The PPI network constructed by the encoded proteins of DEGs, and the orange color represents the key genes.(c) The Venn diagram of genes in PPI and DEmRNA in ceRNA network, the orange color represents the DEmRNA in ceRNA network, and green color represents the genes in PPI network.(d) The GO semantic similarity of key genes, the horizontal axis represents similarity, and the vertical axis was the genes.

Figure 5 :
Figure 5: Construction of risk scoring model.(a) LASSO regression characteristic fitting curve.(b) Quantitative analysis of key genes in LASSO regression.(c) Correlation analysis of expression levels of hub genes.(d) Survival analysis between patients in high-and low-risk groups.Orange color indicated patients in low-risk group, green color indicated patients in high-risk group, the horizontal axis was survival time, and the vertical axis was survival probability.(e) Correlation diagram of the risk factors in risk scoring model.The vertical axis in the top figure was the risk score, the horizontal axis was the patient, the vertical axis in the middle figure was the survival time of the patient, and the horizontal axis was the patient.The lower figure is the heat map of the expression levels of key genes, orange color indicats high expression and green color indicates low expression.

Figure 6 : 11 Figure 7 :
Figure 6: The prognostic relevance of seven characteristic gene.(a) The correlation between risk score, clinical parameter, and key gene expression level.(b)−(f) The survival curves show the effects of HOXC8, ADRA1D, HOXC11, HOXC10, and HOXD10 gene expression on the prognosis of patients.

Figure 8 :
Figure 8: Immune infiltration analysis.(a) The difference of immune cell infiltration between high-risk group and low-risk group, green color indicates high-risk group, orange color indicates low-risk group.(b) and (c) There was a correlation between the immune cell infiltration in the high-risk group and the low-risk group, with blue color indicating a positive correlation and red indicating a negative correlation.(d) and (e)The correlation analysis between immune cell infiltration and hub genes in high-risk group and low-risk group shows that the horizontal axis is immune cell, the vertical axis is key gene, and the node size indicated significant level.The node color represented the correlation level ratio.

Figure A3 :
Figure A3: (a) TCGA-LIHC and GTEX-Liver have obvious batch effect.(b) TCGA-LIHC and GTEX-Liver data are integrated after batch effects removal.(c) There was a significant difference in the expression of seven characteristic genes between hepatocellular carcinoma tumor samples and control samples.* it means P < 0.05, ** means P < 0.01, *** means P < 0.001.

Figure A4 :
Figure A4:The patients with high risk scores in the validation set of CHCC data also showed a tendency to have a worse prognosis.

Table 1 :
GO analysis of DEmRNAs

Table 2 :
KEGG analysis of DEmRNAs